Abstract

Similarity measurements between 3D objects and 2D images are useful for the tasks of
object recognition and classification. We distinguish between two types of similarity metrics:
metrics computed in image space (image metrics) and metrics computed in transformation
space (transformation metrics). Existing methods typically use image metrics; namely, metrics
that measure the difference in the image between the observed image and the nearest view
of the object. An example of such a measure is the Euclidean distance between feature points
in the image and their corresponding points in the nearest view. (Computing this measure is
equivalent to solving the exterior orientation calibration problem.) In this paper we introduce a
different type of metric: transformation metrics. These metrics penalize for the deformations
applied to the object to produce the observed image.

We present a transformation metric that optimally penalizes for “affine deformations” under
weak-perspective. A closed-form solution, together with the nearest view according to this
metric, is derived. The metric is shown to be equivalent to the Euclidean image metric, in
the sense that they bound each other from both above and below. For the Euclidean image
metric we offer a sub-optimal closed-form solution and an iterative scheme to compute the exact
solution.

1 Introduction


Object recognition is a process of selecting the object model that best matches the observed
image. A common approach to recognition uses features (such as points or edges) to represent
objects. An object is recognized in this approach if there exists a viewpoint from
which the model features coincide with the corresponding image features, e.g. [Roberts, 1965,
Fischler and Bolles, 1981, Lowe, 1985, Huttenlocher and Ullman, 1987, Basri and Ullman, 1988,
Thompson and Mundy, 1987, Ullman and Basri, 1991]. Since images often are noisy and models
occasionally are imperfect, it is rarely the case that a model aligns perfectly with the image.
Systems therefore look for a model that “reasonably” aligns with the image. Consequently,
measures that assess the quality of a match become necessary.

Similarity measures between 3D objects and 2D images are needed for a range of applications:

• The recognition of specific objects in noisy images, as described above. 

• The initial classification of novel objects. In this application a new object is associated with
similar objects in the database. This way an image of, e.g., a Victorian chair is associated
with models of (different) familiar chairs.

• The recognition of non-rigid objects whose geometry is not fully specified. An example
is the recognition of 3D hand gestures. In this task only the generic shape of the gesture
is known, and the particular instances differ according to the specific physiology of the
hand.

Existing recognition methods are usually tailored to solve the first of these applications, namely,
the recognition of specific objects from noisy images. Many of these methods are sub-optimal
(see Section 2 for a review), which may result in a large number of either misrecognitions or
false positives. When these methods are extended to handle problems such as classification and
recognition of non-rigid objects, their performance may be even less predictable. The general
problem of recognition therefore requires measures that provide a robust assessment of the
similarity between objects and images. In this paper we describe two such measures, and
develop a rigorous solution to the minimization problem that each measure entails.

A common measure for comparing 3D objects to 2D images is the Euclidean distance between
feature points in the actual image and their corresponding points in the nearest view of
the object. The assumption underlying this measure is that images are significantly less reliable
than models, and so perturbations should be measured in the image plane. This assumption
often suits recognition tasks. Other measures may better suit different assumptions. For example,
when classifying objects, there is an inherent uncertainty in the structure of the classified
object. One may therefore attempt to minimize the amount of deformation applied to the
object to account for this uncertainty. Such a distance is measured in transformation space
rather than in image space. A definition of these two types of measures is given in Section 3.

Measures to compare 3D models and 2D images generally are desired to have metrical
properties; that is, they should monotonically increase with the difference between the measured
entities. (A more exact definition is given in Appendix A.) The Euclidean distance between
the image and the nearest view defines a metric. (We refer to this measure as the image
metric.) The difficulty with employing this measure is that a closed-form solution to the
problem has not yet been found, and therefore numerical methods must currently be employed
to compute the measure. A common method to achieve a closed-form metric is to extend the
set of transformations that objects are allowed to undergo from rigid to affine. The
problem with this measure is that it bounds the rigid measure from below, but not from above.
Other methods either achieve only sub-optimal distances, or they do not define a metric. The
existing approaches are reviewed in Section 2.

This paper presents a closed-form distance metric to compare 3D models and 2D images.
The metric penalizes for the non-rigidities induced by the optimal affine transformation that 
aligns the model to the image under weak-perspective projection. The metric is shown to bound 
the least-square distance between the model and the image both from above and below. We 
foresee three ways to use the metric developed in this paper: 

1. Obtain a direct assessment of the similarity between 3D models and 2D images. 

2. Obtain lower and upper bounds on the image metric. In many cases such bounds may 
suffice to unequivocally determine the identity of the observed object. 

3. Provide an initial guess to be then used by a numerical procedure to solve the image 
distance. 

The rest of this paper is organized as follows: In Section 2 we review related work. In
Section 3 we define the concepts used in this paper. In Section 4 we summarize the main
results of this paper. These results are discussed in detail and proved in Section 5 for the
transformation metric, and Section 6 for the image metric. Sections 5 and 6 can be omitted
on a first reading. Finally, in Section 7 we compare the distances between 3D objects and 2D
images, obtained by alignment, to our results.


2 Previous approaches 

Previous approaches to the problem of model and image comparison using point features are 
divided into three major categories: 

1. Least-square minimization in image space. 

2. Sub-optimal methods using correspondence subsets. 

3. Invariant functions.

The traditional photometric approach to the problem of model and image comparison
involves retrieving a view of the object that minimizes the least-square distance to the image.
This problem is referred to as the exterior orientation calibration problem (or the recovery of the
hand-eye transform) and is defined as follows. Given a set of n 3D points (model points) and a
corresponding set of n 2D points (image points), find the rigid transformation that minimizes
the distance in the image plane between the transformed model points and the image points.
An analytic solution to this problem has not yet been found. (Analytic solutions to the absolute
orientation problem, the least-square distance between pairs of 3D objects, have been found;
see [Horn, 1987, Horn, 1991]. An analytic solution to the least-square distance between pairs
of 2D images has not yet been found.) Consequently, numerical methods are employed (see
reviews in [Tsai, 1987, Yuan, 1989]). Such solutions often suffer from stability problems, are
computationally intensive, and require a good initial guess.

To avoid using numerical methods, frequently the object is allowed to undergo affine
transformations instead of just rigid ones. Affine transformations are composed of general linear
transformations (rather than rotations) and translations; in addition to the rigid
transformations, they include reflection, stretch, and shear. The solution in the affine case is
simpler than that of the rigid case because the quadratic constraints imposed in the rigid case are
not taken into account, enabling the construction of a closed-form solution. At least six points
are required to find an affine solution under perspective projection [Fischler and Bolles, 1981],
and four are required under orthographic projection [Ullman and Basri, 1991].

The affine measure bounds the rigid measure from below. The rigid measure, however, is 
not bounded from above, as is demonstrated by the following example. Consider the case of 
matching four model points to four image points under weak-perspective. Since in this case 
there always exists a unique affine solution, the affine distance between the model and the image 
is zero. On the other hand, since three points uniquely determine the rigid transformation that 
aligns the model to the image, by perturbing one point we can increase the rigid distance
unboundedly. 

A second approach to comparing models to images involves the selection of a small subset
of correspondences (alignment key), solving for the transformation using this subset, and
then transforming the other points and measuring their distance from the corresponding image
points. Three [Fischler and Bolles, 1981, Rives et al., 1981, Haralick et al., 1991] or four
[Horaud et al., 1989] points are required under perspective projection, and three points under
weak perspective [Ullman, 1989, Huttenlocher and Ullman, 1987]. The obtained distance
critically depends on the choice of alignment key. Different choices produce different distance
measures between the model and the image. The results almost always are sub-optimal, since
it is generally better to match all points with small errors than to exactly match a subset of
points and project all the errors onto the others.

A third approach involves the application of invariant functions. Such functions return a constant
value when applied to any image of a particular model. Invariant functions have been used
successfully only with special kinds of models, such as planar objects (e.g., [Lamdan et al., 1987,
Forsyth et al., 1991]). More general objects can be recognized using model-based invariant
functions [Weinshall, 1993]. For noise-free data, model-based invariant functions return zero if
the image is an exact instance of the object. To account for noise, the output of these functions
usually is required to be below some fixed threshold. In general, very little research has been
conducted to characterize the behavior of these functions when the model and the image do
not perfectly align. The result of thresholding therefore becomes arbitrary.


3 Definitions and notation 

In the following discussion, we assume weak-perspective projection. Namely, the object undergoes
a 3D transformation that includes rotation, translation, and scale, and is then orthographically
projected onto the image. Perspective distortions are not accounted for and are treated as
noise.

In order to define a similarity measure for comparing 3D objects to 2D images, as discussed
in Section 1, we first define the best-view of a 3D object given a 2D image:

Definition 1: [best-view] Let d denote a difference measure between two 2D images of n
features. Given a 2D image of an object composed of n features, the best-view of a 3D object
(model) composed of n corresponding features is the view for which the smallest value of d is
obtained. The minimization is performed over all the possible views of the model; the views
are obtained by applying a transformation T, taken from the set of permitted transformations
A, followed by a projection, $\Pi$.

We compute d, the difference between two 2D images of n features, in two ways:

image metric: we measure position differences in the image; namely, the Euclidean distance
between corresponding points in the two images, summed over all points.

transformation metric: the images are considered to be instances of a single 3D object. 
The metric measures the difference between the two transformations that align the object 
with the two images. This difference can be measured, for instance, by computing the 
Euclidean distance between the matrices that represent the two transformations (when 
the two transformations are linear). 

As mentioned above, the measure d is applied to the given image and to the views of the 
given model. These views are generated by applying a transformation from a set A of permitted 
transformations. The view that minimizes the distance d to the image is considered as the best 
view, and the distance between the best view and the actual image is considered as the distance 
between the object and the image. 

We consider in this paper two families of transformations: rigid transformations and affine
transformations, and we discuss the following metrics:

$N_{im}$: a metric that measures the image distance between the given image and the best rigid
view of the object.

$N_{af}$: a metric that measures the image distance between the given image and the best affine
view of the object.

$N_{tr}$: a transformation metric. We assume that the image is an affine view of the object. (When
it is not, we replace the image with the best affine view.) We look for the rigid view
of the object so as to minimize the difference between the two transformations: the
affine transformation (between the object and the image) and the rigid transformation
(between the object and its possible rigid views). In other words, we look for a view so
as to minimize the amount of “affine deformations” applied to the object.


To illustrate the difference between image metrics and transformation metrics, Figure 1 
shows an example of three 2D images, whose similarity relations reverse, depending on which 
kind of metric is used. Consider the planar object in Figure 1(b) as a reference object, and 
assume A contains the set of rigid transformations in 2D. The images in (a) and (c) are obtained 
by stretching the object horizontally (by 9/7) and vertically (by 3/2) respectively. (The image 
in (b) is obtained by applying a unit matrix to the object.) 


[Figure 1 omitted: three panels (a), (b), (c). The pair (a), (b) is labeled "closer in
transformation space" and the pair (b), (c) is labeled "closer in image space".]

Figure 1: The 2D image shown in (b) is closer to the image in (a) when the difference is computed in
transformation space, and closer to the image in (c) when the difference is the Euclidean difference between the
two images.


• The image metric between the images in (b) and (a) is 4: two pixels at each of the left
corners of the rectangle.

The image metric between the images in (b) and (c) is 2: one pixel at each of the upper
corners of the rectangle.

Therefore, according to the image metric, image (c) is closer to (b) than (a) is.

• To compute the transformation metric, consider the planar object illustrated in (b). We
compute the difference between the matrices that represent the affine transformations from
(b) to (a) and from (b) to (c), and the matrix that represents the best rigid transformation (in
this case it is the unit matrix).

(a) is obtained from (b) by a horizontal stretch of 9/7.
The transformation metric between (a) and (b) is therefore 9/7 - 1 = 2/7.

(c) is obtained from (b) by a vertical stretch of 3/2. The transformation metric in this
case is 3/2 - 1 = 1/2.

Therefore, according to the transformation metric, image (a) is closer to (b) than (c) is.

It is interesting to note that in this example the solution obtained by minimizing the
transformation metric seems to better correlate with human perception than the solution obtained by
minimizing the image metric.
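
The numbers in this example can be reproduced with a few lines of code. The sketch below is only illustrative: the 7 x 2 rectangle size and the choice of which edge stays fixed during each stretch are assumptions inferred from the stretch factors and pixel offsets quoted in the text, and the function names are hypothetical.

```python
import math
from fractions import Fraction

# Reference rectangle (b). The 7 x 2 size is an assumption, chosen so that the
# stretches 9/7 and 3/2 move corners by whole pixels as in the example.
WIDTH, HEIGHT = 7, 2
rect_b = [(0, 0), (WIDTH, 0), (WIDTH, HEIGHT), (0, HEIGHT)]


def stretch(points, sx, sy, anchor):
    """Scale points by (sx, sy) about a fixed anchor point."""
    ax, ay = anchor
    return [(ax + (x - ax) * sx, ay + (y - ay) * sy) for x, y in points]


def image_distance(view, image):
    """Sum of Euclidean distances between corresponding points."""
    return sum(math.hypot(float(x1 - x2), float(y1 - y2))
               for (x1, y1), (x2, y2) in zip(view, image))


def transformation_distance(sx, sy):
    """Euclidean (Frobenius) distance between diag(sx, sy) and the unit matrix."""
    return math.hypot(float(sx - 1), float(sy - 1))


sa, sc = Fraction(9, 7), Fraction(3, 2)
rect_a = stretch(rect_b, sa, 1, anchor=(WIDTH, 0))  # right edge held fixed
rect_c = stretch(rect_b, 1, sc, anchor=(0, 0))      # bottom edge held fixed

print("image metric:          (b,a) =", image_distance(rect_b, rect_a),
      " (b,c) =", image_distance(rect_b, rect_c))
print("transformation metric: (b,a) =", transformation_distance(sa, 1),
      " (b,c) =", transformation_distance(1, sc))
```

Under these assumptions the image distances come out as 4 and 2, while the transformation distances are 2/7 and 1/2, so the ordering of (a) and (c) relative to (b) reverses between the two metrics.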

3.1 Derivation of $N_{im}$ and $N_{af}$

We now define the rigid and the affine image metrics explicitly. Under weak-perspective
projection, the position in the image, $\vec{q}_i = (x_i, y_i)$, of a model point $\vec{p}_i = (X_i, Y_i, Z_i)$ following a
rigid transformation is given by

$$\vec{q}_i = \Pi(R\vec{p}_i + \vec{t})$$

where R is a scaled 3x3 rotation matrix, $\vec{t}$ is a translation vector, and $\Pi$ represents an
orthographic projection. More explicitly, denote by $\vec{r}_1^T$ and $\vec{r}_2^T$ the top two row vectors of R,
and denote $\vec{t} = (t_x, t_y, t_z)$; we have that

$$\begin{aligned}
x_i &= \vec{r}_1^T \vec{p}_i + t_x \\
y_i &= \vec{r}_2^T \vec{p}_i + t_y
\end{aligned} \tag{1}$$

where

$$\begin{aligned}
\vec{r}_1^T \vec{r}_2 &= 0 \\
\vec{r}_1^T \vec{r}_1 &= \vec{r}_2^T \vec{r}_2
\end{aligned} \tag{2}$$

The rigid metric, $N_{im}$, minimizes the difference between the two sides of Eq. (1) subject to the
constraints (2).
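
As an illustration of Eqs. (1) and (2), the following sketch builds the top two rows of a scaled rotation matrix, projects a few model points, and checks the two rigidity constraints numerically. The yaw/pitch parameterization, the function names, and the sample points are assumptions made for the example only.

```python
import math


def scaled_rotation_rows(scale, yaw, pitch):
    """Top two rows of scale * Rx(pitch) * Rz(yaw).

    The yaw/pitch parameterization is an illustrative assumption; any scaled
    rotation matrix would do.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    r1 = (scale * cy, -scale * sy, 0.0)
    r2 = (scale * cp * sy, scale * cp * cy, -scale * sp)
    return r1, r2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(points, r1, r2, tx=0.0, ty=0.0):
    """Weak-perspective image: x_i = r1 . p_i + tx, y_i = r2 . p_i + ty."""
    return [(dot(r1, p) + tx, dot(r2, p) + ty) for p in points]


model = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
r1, r2 = scaled_rotation_rows(scale=2.0, yaw=0.3, pitch=-0.7)
image = project(model, r1, r2, tx=5.0, ty=-1.0)

# The two rigidity constraints: orthogonal rows of equal length.
print("r1 . r2         =", dot(r1, r2))
print("|r1|^2 - |r2|^2 =", dot(r1, r1) - dot(r2, r2))
```

Both printed quantities should vanish up to round-off for any choice of scale and angles; a general affine pair of rows would not satisfy them.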

When the object is allowed to undergo affine transformations, the rotation matrix R is
replaced by a general 3x3 linear matrix (denoted by A) and the constraints (2) are ignored.
That is

$$\vec{q}_i = \Pi(A\vec{p}_i + \vec{t})$$

Denoting by $\vec{a}_1^T$ and $\vec{a}_2^T$ the top two row vectors of A, we obtain

$$\begin{aligned}
x_i &= \vec{a}_1^T \vec{p}_i + t_x \\
y_i &= \vec{a}_2^T \vec{p}_i + t_y
\end{aligned} \tag{3}$$

The affine metric, $N_{af}$, minimizes the difference between the two sides of Eq. (3).

To define the rigid and the affine metrics, we first note that the translation component of
both the best rigid and affine transformations can be ignored if the centroids of both model
and image points are moved to the origin. In other words, we begin by translating the model
and image points so that

$$\sum_{i=1}^{n} \vec{p}_i = \sum_{i=1}^{n} \vec{q}_i = 0 \tag{4}$$

We claim that now $\vec{t} = 0$. The proof is given in Appendix C.
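
The centering step of Eq. (4) amounts to subtracting the centroid from each point set. A minimal sketch, with hypothetical helper names and made-up coordinates:

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def center(points):
    """Translate the points so that their centroid is at the origin."""
    c = centroid(points)
    return [tuple(v - m for v, m in zip(p, c)) for p in points]


# Made-up model (3D) and image (2D) points, listed in corresponding order.
model = [(2.0, 0.0, 1.0), (4.0, 2.0, 1.0), (3.0, 1.0, 4.0), (3.0, 5.0, 2.0)]
image = [(10.0, 3.0), (12.0, 5.0), (11.0, 4.0), (11.0, 8.0)]

model_c = center(model)
image_c = center(image)
print(centroid(model_c), centroid(image_c))
```

After this step both centroids are at the origin, so the remaining computations can work with the linear part of the transformation alone.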

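Denote

$$P = \begin{pmatrix} X_1 & Y_1 & Z_1 \\ \vdots & \vdots & \vdots \\ X_n & Y_n & Z_n \end{pmatrix}$$

a matrix of model point coordinates, and denote

$$\vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad
\vec{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \tag{5}$$

the location vectors of the corresponding image points. A rigid metric that reflects the desired
minimization is given by

$$N_{im} = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - P\vec{r}_2\|^2
\quad \text{s.t.} \quad \vec{r}_1^T \vec{r}_2 = 0, \;\; \vec{r}_1^T \vec{r}_1 = \vec{r}_2^T \vec{r}_2 \tag{6}$$

The corresponding affine metric is given by

$$N_{af} = \min_{\vec{a}_1, \vec{a}_2 \in \mathbb{R}^3} \|\vec{x} - P\vec{a}_1\|^2 + \|\vec{y} - P\vec{a}_2\|^2 \tag{7}$$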

In the affine case the solution is simple. We assume that the rank of P is 3 (the case for
general, not coplanar, 3D objects). Denote by $P^+ = (P^T P)^{-1} P^T$ the pseudo-inverse of P; we
obtain that

$$\begin{aligned}
\vec{a}_1 &= P^+ \vec{x} \\
\vec{a}_2 &= P^+ \vec{y}
\end{aligned} \tag{8}$$

and the affine distance is given by

$$N_{af} = \|(I - PP^+)\vec{x}\|^2 + \|(I - PP^+)\vec{y}\|^2 \tag{9}$$
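
The affine solution can be sketched without any linear-algebra library by solving the 3x3 normal equations $(P^T P)\vec{a} = P^T \vec{v}$ directly, which is equivalent to applying $P^+$ when P has rank 3. The helper names and the sample data below are assumptions for illustration; Cramer's rule is used only because the system is 3x3, and a production implementation would more likely call a least-squares routine.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 system m a = b by Cramer's rule (m must be non-singular)."""
    d = det3(m)
    solution = []
    for k in range(3):
        mk = [list(row) for row in m]
        for i in range(3):
            mk[i][k] = b[i]
        solution.append(det3(mk) / d)
    return solution


def affine_fit(P, v):
    """Return P^+ v by solving the normal equations (P^T P) a = P^T v."""
    cols = list(zip(*P))
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]
    rhs = [dot(c, v) for c in cols]
    return solve3(gram, rhs)


def affine_metric(P, x, y):
    """Return a1, a2 and the squared residual N_af."""
    a1, a2 = affine_fit(P, x), affine_fit(P, y)
    n_af = sum((xi - dot(p, a1)) ** 2 + (yi - dot(p, a2)) ** 2
               for p, xi, yi in zip(P, x, y))
    return a1, a2, n_af


# Four non-coplanar model points with centroid at the origin (made-up data).
P = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]
x = [3.0, -1.0, -2.0, 0.0]   # centered image x-coordinates
y = [0.5, 2.0, -1.5, -1.0]   # centered image y-coordinates
a1, a2, n_af = affine_metric(P, x, y)
print(a1, a2, n_af)
```

With only four centered, non-coplanar points the affine residual is zero for any centered image, which is the situation described in Section 2 where the affine measure gives no information about the rigid one.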


Since the solution in the rigid case is significantly more difficult than the solution in the
affine case, often the affine solution is considered, and the rigidity constraints are used only for
verification (e.g. [Ullman and Basri, 1991, Weinshall, 1993, DeMenthon and Davis, 1992]).

The constraints (2) (substituting $\vec{a}_i$ for $\vec{r}_i$, and using Eq. (8)) can be rewritten as

$$\begin{aligned}
\vec{x}^T (P^+)^T P^+ \vec{y} &= 0 \\
\vec{x}^T (P^+)^T P^+ \vec{x} &= \vec{y}^T (P^+)^T P^+ \vec{y}
\end{aligned}$$

Denote

$$B = (P^+)^T P^+ \tag{10}$$

we obtain that

$$\begin{aligned}
\vec{x}^T B \vec{y} &= 0 \\
\vec{x}^T B \vec{x} &= \vec{y}^T B \vec{y}
\end{aligned} \tag{11}$$

where B is an n x n symmetric, positive-semidefinite matrix of rank 3. (The rank would be
smaller if the object points are coplanar.)

We call B the characteristic matrix of the object. B is a natural extension to the 3x3 
model-based invariant matrix defined in [Weinshall, 1993]. A more general definition, and its 
efficient computation from images, is discussed in Appendix B. 
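
The stated properties of B can be checked by expanding its definition with $P^+ = (P^T P)^{-1} P^T$. The short expansion below is a worked check added for the reader, not part of the original derivation:

```latex
B = (P^+)^T P^+ = P (P^T P)^{-1} (P^T P)^{-1} P^T = P (P^T P)^{-2} P^T ,
\qquad
\vec{v}^T B \vec{v} = \| P^+ \vec{v} \|^2 \ge 0 \quad \text{for all } \vec{v} \in \mathbb{R}^n ,
\qquad
\vec{x}^T B \vec{y} = (P^+ \vec{x})^T (P^+ \vec{y}) = \vec{a}_1^T \vec{a}_2 .
```

The first form shows that B is symmetric with the same rank as P, the second that it is positive-semidefinite, and the third that every quadratic form in B reduces to an inner product of the 3-vectors $\vec{a}_1 = P^+\vec{x}$ and $\vec{a}_2 = P^+\vec{y}$.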

3.2 Derivation of $N_{tr}$

We can now define a transformation metric as follows. Consider the affine solution. The nearest
“affine view” of the object is obtained by applying the model matrix, P, to a pair of vectors, $\vec{a}_1$
and $\vec{a}_2$, defined in Eq. (8). In general, this solution is not rigid, and so the rigid constraints (2)
do not hold for these vectors. The metric described here is based on the following rule. We
look for another pair of vectors, $\vec{r}_1$ and $\vec{r}_2$, which satisfy the rigid constraints and minimize
the Euclidean distance to the affine vectors $\vec{a}_1$ and $\vec{a}_2$. $P\vec{r}_1$ and $P\vec{r}_2$ define the best rigid view
of the object under the defined metric. The metric, $N_{tr}$, is defined by

$$N_{tr} = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|\vec{a}_1 - \vec{r}_1\|^2 + \|\vec{a}_2 - \vec{r}_2\|^2
\quad \text{s.t.} \quad \vec{r}_1^T \vec{r}_2 = 0, \;\; \vec{r}_1^T \vec{r}_1 = \vec{r}_2^T \vec{r}_2 \tag{12}$$

where $\vec{a}_1$ and $\vec{a}_2$ constitute the optimal affine solution; therefore

$$N_{tr} = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|P^+\vec{x} - \vec{r}_1\|^2 + \|P^+\vec{y} - \vec{r}_2\|^2
\quad \text{s.t.} \quad \vec{r}_1^T \vec{r}_2 = 0, \;\; \vec{r}_1^T \vec{r}_1 = \vec{r}_2^T \vec{r}_2 \tag{13}$$

In Section 5 we present a closed-form solution for this metric, and in Section 6 we show how
this metric can be used to bound the image metric from both above and below.


4 Summary of results 

In the rest of the paper we prove the following results:

4.1 Transformation space:

The transformation metric defined in Eq. (13) has the following solution

$$N_{tr} = \frac{1}{2}\left( \vec{x}^T B \vec{x} + \vec{y}^T B \vec{y} - 2\sqrt{\vec{x}^T B \vec{x} \cdot \vec{y}^T B \vec{y} - (\vec{x}^T B \vec{y})^2} \right)$$

where B is defined in Eq. (10), and $\vec{x}$, $\vec{y}$ in Eq. (5). The best view according to this metric is
given by

$$\begin{aligned}
\vec{x}^* &= PP^+(\beta_1 \vec{x} + \beta_2 \vec{y}) \\
\vec{y}^* &= PP^+(\gamma_1 \vec{x} + \gamma_2 \vec{y})
\end{aligned}$$

where $\beta_1, \beta_2, \gamma_1, \gamma_2$ are defined in Appendix D.
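
Because $\vec{x}^T B \vec{x}$, $\vec{y}^T B \vec{y}$, and $\vec{x}^T B \vec{y}$ are inner products of the affine rows $P^+\vec{x}$ and $P^+\vec{y}$, the closed-form value can be evaluated from those two 3-vectors alone. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def transformation_metric(a1, a2):
    """Closed-form N_tr for the affine rows a1 = P^+ x and a2 = P^+ y."""
    xx = dot(a1, a1)  # equals x^T B x
    yy = dot(a2, a2)  # equals y^T B y
    xy = dot(a1, a2)  # equals x^T B y
    # Clamp at zero to guard against tiny negative values from round-off.
    root = math.sqrt(max(xx * yy - xy * xy, 0.0))
    return 0.5 * (xx + yy - 2.0 * root)


print(transformation_metric((2.0, 0.0, 0.0), (0.0, 2.0, 0.0)))  # scaled rotation rows
print(transformation_metric((1.5, 0.0, 0.0), (0.0, 1.0, 0.0)))  # unequal stretch
print(transformation_metric((1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))  # shear
```

The first pair satisfies both rigidity constraints and gives 0; the other two violate one constraint each and give positive penalties (0.125 and 0.5).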


4.2 Image space: 


Using $N_{tr}$ we can bound the image metric from both above and below. Denote

$$N_{af} = \|(I - PP^+)\vec{x}\|^2 + \|(I - PP^+)\vec{y}\|^2$$

we show that

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + \lambda_3 N_{tr}$$

where $\lambda_1 \le \lambda_2 \le \lambda_3$ are the eigenvalues of $P^T P$. A sub-optimal solution to $N_{im}$ is given by


$$N_{af} + \frac{N_{tr}}{\mu_1 + \mu_2}$$

where the computation of $\mu_1$, $\mu_2$ is described in Appendix E. A tighter upper bound is deduced
from this sub-optimal solution

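$$N_{im} \le N_{af} + \mathrm{H.M.}\{\lambda_2, \lambda_3\}\, N_{tr} \le N_{af} + 2\lambda_2 N_{tr}$$

where $\mathrm{H.M.}\{\lambda_2, \lambda_3\} = \frac{2\lambda_2\lambda_3}{\lambda_2 + \lambda_3}$ is the harmonic mean of $\lambda_2$, $\lambda_3$. The sub-optimal solution is
proposed as an initial guess for an iterative algorithm to compute $N_{im}$.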
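
Assuming the standard definition of the harmonic mean of two positive numbers, a short worked inequality shows how the harmonic-mean bound relates both to $2\lambda_2$ and to the coarser bound that uses the largest eigenvalue:

```latex
\mathrm{H.M.}\{\lambda_2, \lambda_3\}
  = \frac{2\lambda_2\lambda_3}{\lambda_2 + \lambda_3}
  = 2\lambda_2 \cdot \frac{\lambda_3}{\lambda_2 + \lambda_3} \le 2\lambda_2 ,
\qquad
\frac{2\lambda_2\lambda_3}{\lambda_2 + \lambda_3}
  = \lambda_3 \cdot \frac{2\lambda_2}{\lambda_2 + \lambda_3} \le \lambda_3
  \quad (\text{since } \lambda_2 \le \lambda_3) .
```

So the harmonic-mean bound is never looser than the bound with $\lambda_3$, and it improves on it most when $\lambda_2$ is much smaller than $\lambda_3$.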


5 Closed-form solution in transformation space 

We now present a metric to compare 3D models and 2D images under weak-perspective
projection. The metric is a closed-form solution to the transformation metric, $N_{tr}$, defined in
Eq. (13). We use the notation developed in Section 3: B is the n x n characteristic matrix
of the object, and $\vec{x}, \vec{y} \in \mathbb{R}^n$ contain the x- and y-coordinates of the image features. The metric is
given by

$$N_{tr} = \frac{1}{2}\left( \vec{x}^T B \vec{x} + \vec{y}^T B \vec{y} - 2\sqrt{\vec{x}^T B \vec{x} \cdot \vec{y}^T B \vec{y} - (\vec{x}^T B \vec{y})^2} \right) \tag{14}$$

This metric penalizes for the nonrigidities of the optimal affine transformation. Note that
$N_{tr} = 0$ if the two rigid constraints in Eq. (11) are satisfied. Otherwise, $N_{tr} > 0$ represents the
optimal penalty for a deviation from satisfying the two constraints.

Derivation of the results:


In the rest of this section we prove that the expression for $N_{tr}$, given by Eq. (14), is indeed
the solution to the transformation metric defined in Eq. (13). The proof proceeds as follows:
Theorem 1 computes the minimal solution when $\vec{r}_1$ and $\vec{r}_2$ are restricted to the plane spanned
by $\vec{a}_1$ and $\vec{a}_2$; Theorem 2 extends this result to three-space.



Figure 2: The vectors $\vec{a}_1$, $\vec{a}_2$, $\vec{r}_1$, and $\vec{r}_2$ in the coordinate system specified in Theorem 1. $\vec{a}_1$ and $\vec{a}_2$ represent
the solution for the affine case. $\vec{r}_1$ and $\vec{r}_2$ are constrained to be in the same plane as $\vec{a}_1$ and $\vec{a}_2$, to be orthogonal,
and to share the same norm.


Theorem 1: When $\vec{r}_1$ and $\vec{r}_2$ are limited to $\mathrm{span}\{\vec{a}_1, \vec{a}_2\}$, $N_{tr}$ is given by Eq. (14).

Proof: We first define a new coordinate system in which

$$\begin{aligned}
\vec{a}_1 &= w_1 (1, 0) \\
\vec{a}_2 &= w_2 (\cos\theta, \sin\theta) \\
\vec{r}_1 &= s (\cos\alpha, -\sin\alpha) \\
\vec{r}_2 &= s (\sin\alpha, \cos\alpha)
\end{aligned}$$

(see Figure 2). $\theta$ is the angle between $\vec{a}_1$ and $\vec{a}_2$, $w_1$ and $w_2$ are the lengths of $\vec{a}_1$ and $\vec{a}_2$
respectively, s is the common length of the two rotation vectors, $\vec{r}_1$ and $\vec{r}_2$, and $-\alpha$ is the angle
between $\vec{a}_1$ and $\vec{r}_1$. Without loss of generality it is assumed below that $0^\circ \le \theta \le 180^\circ$ and
$-90^\circ \le \alpha \le 90^\circ$. Notice that $w_1$, $w_2$, and $\theta$ are given and that s and $\alpha$ are unknown.

Denote by f the term to be minimized, that is

$$f(\alpha, s) = \|\vec{a}_1 - \vec{r}_1\|^2 + \|\vec{a}_2 - \vec{r}_2\|^2$$

then

$$\begin{aligned}
f(\alpha, s) &= (w_1 - s\cos\alpha)^2 + s^2\sin^2\alpha + (s\sin\alpha - w_2\cos\theta)^2 + (s\cos\alpha - w_2\sin\theta)^2 \\
&= w_1^2 + w_2^2 + 2s^2 - 2s\left([w_1 + w_2\sin\theta]\cos\alpha + w_2\cos\theta\sin\alpha\right)
\end{aligned}$$

The partial derivatives of f are given by

$$\begin{aligned}
f_\alpha &= 2s\left([w_1 + w_2\sin\theta]\sin\alpha - w_2\cos\theta\cos\alpha\right) \\
f_s &= 4s - 2\left([w_1 + w_2\sin\theta]\cos\alpha + w_2\cos\theta\sin\alpha\right)
\end{aligned}$$

To find possible minima we equate these derivatives to zero

$$f_\alpha = 0, \qquad f_s = 0$$

Solutions with s = 0 are not optimal. In this case $f(\alpha, 0) = w_1^2 + w_2^2$, and later we show that
solutions with s > 0 always imply smaller values for f.

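When $s \neq 0$, $f_\alpha = 0$ implies

$$\tan\alpha_{min} = \frac{w_2\cos\theta}{w_1 + w_2\sin\theta}$$

therefore

$$\cos\alpha_{min} = \frac{1}{\sqrt{1 + (\tan\alpha_{min})^2}} = \frac{w_1 + w_2\sin\theta}{\sqrt{w_1^2 + w_2^2 + 2w_1w_2\sin\theta}}$$

$f_s = 0$ implies

$$s_{min} = \frac{1}{2}\left([w_1 + w_2\sin\theta]\cos\alpha_{min} + w_2\cos\theta\sin\alpha_{min}\right)$$

Notice the similarity of this expression to the expression for f. At the minimum point f can
be rewritten as

$$f_{min} = w_1^2 + w_2^2 - 2(s_{min})^2 \tag{15}$$

(From which it is apparent that any solution for f with $s \neq 0$ would be smaller than the solution
with s = 0.) Substituting for $\alpha_{min}$ we obtain

$$\begin{aligned}
s_{min} &= \frac{1}{2}\left([w_1 + w_2\sin\theta]\cos\alpha_{min} + w_2\cos\theta\sin\alpha_{min}\right) \\
&= \frac{1}{2}\cos\alpha_{min}\left(w_1 + w_2\sin\theta + w_2\cos\theta\tan\alpha_{min}\right) \\
&= \frac{1}{2}\,\frac{w_1 + w_2\sin\theta}{\sqrt{w_1^2 + w_2^2 + 2w_1w_2\sin\theta}}\left(w_1 + w_2\sin\theta + \frac{w_2^2\cos^2\theta}{w_1 + w_2\sin\theta}\right) \\
&= \frac{1}{2}\sqrt{w_1^2 + w_2^2 + 2w_1w_2\sin\theta}
\end{aligned}$$

and therefore

$$f_{min} = w_1^2 + w_2^2 - 2(s_{min})^2 = w_1^2 + w_2^2 - \frac{1}{2}\left(w_1^2 + w_2^2 + 2w_1w_2\sin\theta\right)$$

or,

$$f_{min} = \frac{1}{2}\left(w_1^2 + w_2^2 - 2w_1w_2\sin\theta\right)$$

Recall that $w_1$ and $w_2$ are the lengths of $\vec{a}_1$ and $\vec{a}_2$, that is

$$\begin{aligned}
w_1^2 &= \vec{a}_1^T \vec{a}_1 = \vec{x}^T B \vec{x} \\
w_2^2 &= \vec{a}_2^T \vec{a}_2 = \vec{y}^T B \vec{y}
\end{aligned}$$

and $\theta$ is the angle between the two vectors, namely

$$w_1 w_2 \sin\theta = \sqrt{w_1^2 w_2^2 (1 - \cos^2\theta)} = \sqrt{\vec{x}^T B \vec{x} \cdot \vec{y}^T B \vec{y} - (\vec{x}^T B \vec{y})^2}$$

We obtain that

$$f_{min} = \frac{1}{2}\left( \vec{x}^T B \vec{x} + \vec{y}^T B \vec{y} - 2\sqrt{\vec{x}^T B \vec{x} \cdot \vec{y}^T B \vec{y} - (\vec{x}^T B \vec{y})^2} \right)$$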


□ 
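
The in-plane minimization of Theorem 1 can be sanity-checked numerically. The sketch below evaluates f in the coordinate system of the proof, plugs in the stationary point, and compares the result with the closed form; the particular values of w1, w2, and theta are arbitrary test inputs, and the function names are hypothetical.

```python
import math


def f(alpha, s, w1, w2, theta):
    """Squared distance between (a1, a2) and (r1, r2) in the proof's coordinates."""
    a1 = (w1, 0.0)
    a2 = (w2 * math.cos(theta), w2 * math.sin(theta))
    r1 = (s * math.cos(alpha), -s * math.sin(alpha))
    r2 = (s * math.sin(alpha), s * math.cos(alpha))
    return ((a1[0] - r1[0]) ** 2 + (a1[1] - r1[1]) ** 2
            + (a2[0] - r2[0]) ** 2 + (a2[1] - r2[1]) ** 2)


def minimizer(w1, w2, theta):
    """Stationary point (alpha_min, s_min) obtained from f_alpha = f_s = 0."""
    alpha = math.atan2(w2 * math.cos(theta), w1 + w2 * math.sin(theta))
    s = 0.5 * math.sqrt(w1 * w1 + w2 * w2 + 2.0 * w1 * w2 * math.sin(theta))
    return alpha, s


def closed_form(w1, w2, theta):
    """f_min = (w1^2 + w2^2 - 2 w1 w2 sin(theta)) / 2."""
    return 0.5 * (w1 * w1 + w2 * w2 - 2.0 * w1 * w2 * math.sin(theta))


w1, w2, theta = 2.0, 1.5, 1.0  # arbitrary test values, theta in radians
alpha_min, s_min = minimizer(w1, w2, theta)
print(f(alpha_min, s_min, w1, w2, theta), closed_form(w1, w2, theta))
```

The two printed values should agree to round-off, and perturbing alpha or s away from the stationary point should only increase f.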


In Theorem 1 we proved that if $\vec{r}_1$ and $\vec{r}_2$ are restricted to the plane spanned by $\vec{a}_1$ and $\vec{a}_2$,
the metric $N_{tr}$ is given by Eq. (14). In Theorem 2 below we prove that any other solution for
$\vec{r}_1$ and $\vec{r}_2$ results in a larger value for f, and therefore the minimum for f is obtained inside the
plane, implying that $N_{tr}$ indeed is given by Eq. (14).

Theorem 2: The optimal $\vec{r}_1$ and $\vec{r}_2$ lie in the plane spanned by $\vec{a}_1$ and $\vec{a}_2$.


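Proof: Assume, by way of contradiction, that $\vec{r}_1, \vec{r}_2 \notin \mathrm{span}\{\vec{a}_1, \vec{a}_2\}$; we show that the
corresponding value for f is not minimal.

Consider first the plane spanned by $\vec{r}_2$ and $\vec{a}_1$, and assume, by way of contradiction, that
$\vec{r}_1 \notin \mathrm{span}\{\vec{r}_2, \vec{a}_1\}$; we show that there exists a vector $\vec{r}_1'$ such that

$$\|\vec{r}_1'\| = \|\vec{r}_2\|, \qquad \vec{r}_1' \perp \vec{r}_2$$

and

$$\|\vec{r}_1' - \vec{a}_1\| < \|\vec{r}_1 - \vec{a}_1\|$$

contradicting the optimality of f.

Assume $\|\vec{r}_2\| = s$, and denote by $\vec{r}_1'$ a vector with length s in the direction $(\vec{r}_2 \times \vec{a}_1) \times \vec{r}_2$.
This vector lies in $\mathrm{span}\{\vec{r}_2, \vec{a}_1\}$ and satisfies

$$\vec{r}_1' \perp \vec{r}_2$$

(There exist two such vectors, opposing in their direction. We consider the one nearest to $\vec{a}_1$.)
We now show that

$$\|\vec{r}_1' - \vec{a}_1\| < \|\vec{r}_1 - \vec{a}_1\|$$

Denote the angle between $\vec{a}_1$ and $\vec{r}_1'$ by $\alpha$, and denote the angle between $\vec{r}_1'$ and $\vec{r}_1$ by $\beta$. Also,
denote $w_1 = \|\vec{a}_1\|$ and $s = \|\vec{r}_1\| = \|\vec{r}_2\| = \|\vec{r}_1'\|$. We can rotate the coordinate system so as to
obtain

$$\begin{aligned}
\vec{r}_1' &= s(1, 0, 0) \\
\vec{r}_2 &= s(0, 1, 0) \\
\vec{a}_1 &= w_1(\cos\alpha, \sin\alpha, 0) \\
\vec{r}_1 &= s(\cos\beta, 0, \sin\beta)
\end{aligned}$$

Now,

$$\begin{aligned}
\|\vec{r}_1' - \vec{a}_1\|^2 &= (s - w_1\cos\alpha)^2 + w_1^2\sin^2\alpha = w_1^2 + s^2 - 2sw_1\cos\alpha \\
\|\vec{r}_1 - \vec{a}_1\|^2 &= (s\cos\beta - w_1\cos\alpha)^2 + w_1^2\sin^2\alpha + s^2\sin^2\beta = w_1^2 + s^2 - 2sw_1\cos\alpha\cos\beta
\end{aligned}$$

and therefore, when $\alpha \neq 0^\circ$ and $\beta \neq 0^\circ$ (when $\beta = 0^\circ$, $\vec{r}_1$ and $\vec{r}_1'$ coincide),

$$\|\vec{r}_1' - \vec{a}_1\| < \|\vec{r}_1 - \vec{a}_1\|$$

contradicting the minimality property. Therefore, $\vec{r}_1 \in \mathrm{span}\{\vec{r}_2, \vec{a}_1\}$. Similarly, it can be shown
that $\vec{r}_2 \in \mathrm{span}\{\vec{r}_1, \vec{a}_2\}$; therefore all four vectors $\vec{a}_1$, $\vec{a}_2$, $\vec{r}_1$, and $\vec{r}_2$ lie in a single plane.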

□ 



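Corollary 3: The transformation metric is given by

$$N_{tr} = \frac{1}{2}\left( \vec{x}^T B \vec{x} + \vec{y}^T B \vec{y} - 2\sqrt{\vec{x}^T B \vec{x} \cdot \vec{y}^T B \vec{y} - (\vec{x}^T B \vec{y})^2} \right)$$

and the best view for this metric is

$$\begin{aligned}
\vec{x}^* &= PP^+(\beta_1 \vec{x} + \beta_2 \vec{y}) \\
\vec{y}^* &= PP^+(\gamma_1 \vec{x} + \gamma_2 \vec{y})
\end{aligned}$$

where

$$\begin{aligned}
\beta_1 &= \frac{1}{2}\left(1 + \frac{\vec{y}^T B \vec{y}}{\sqrt{\vec{x}^T B \vec{x} \cdot \vec{y}^T B \vec{y} - (\vec{x}^T B \vec{y})^2}}\right) \\
\beta_2 = \gamma_1 &= -\frac{\vec{x}^T B \vec{y}}{2\sqrt{\vec{x}^T B \vec{x} \cdot \vec{y}^T B \vec{y} - (\vec{x}^T B \vec{y})^2}} \\
\gamma_2 &= \frac{1}{2}\left(1 + \frac{\vec{x}^T B \vec{x}}{\sqrt{\vec{x}^T B \vec{x} \cdot \vec{y}^T B \vec{y} - (\vec{x}^T B \vec{y})^2}}\right)
\end{aligned}$$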


Proof: The expression for the metric immediately follows from Theorems 1 and 2. The
expression for the best view is developed in Appendix D.

□ 
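
The best-view coefficients can be checked in the 3-vector space of the affine rows. The sketch below forms $\vec{r}_1 = \beta_1\vec{a}_1 + \beta_2\vec{a}_2$ and $\vec{r}_2 = \gamma_1\vec{a}_1 + \gamma_2\vec{a}_2$ and verifies that the pair is rigid and attains the closed-form distance. The coefficient expressions, including the negative sign of the cross term, are taken here from solving the coordinate system of Theorem 1 and should be treated as an assumption of this sketch; it also assumes the two affine rows are not parallel, so the square root is positive. Function names and inputs are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def combine(c1, u, c2, v):
    """Linear combination c1 * u + c2 * v of two vectors."""
    return tuple(c1 * a + c2 * b for a, b in zip(u, v))


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_rigid_pair(a1, a2):
    """Rigid pair (r1, r2) nearest to the affine rows (a1, a2)."""
    xx, yy, xy = dot(a1, a1), dot(a2, a2), dot(a1, a2)
    root = math.sqrt(xx * yy - xy * xy)  # assumed > 0: a1, a2 not parallel
    beta1 = 0.5 * (1.0 + yy / root)
    beta2 = gamma1 = -xy / (2.0 * root)
    gamma2 = 0.5 * (1.0 + xx / root)
    return combine(beta1, a1, beta2, a2), combine(gamma1, a1, gamma2, a2)


a1, a2 = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)  # a sheared (non-rigid) pair
r1, r2 = best_rigid_pair(a1, a2)
print(r1, r2)
print("r1 . r2 =", dot(r1, r2), " |r1|^2 =", dot(r1, r1), " |r2|^2 =", dot(r2, r2))
print("distance =", sqdist(a1, r1) + sqdist(a2, r2))
```

For this input the pair is (1, -0.5, 0) and (0.5, 1, 0): orthogonal, of equal length, and at squared distance 0.5 from the affine rows, which matches the closed-form metric for the same input. Applying P to the two vectors gives the best view itself.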

6 Solution in image space 

In order to compute the image metric as defined in Section 3, we need to solve the constrained
minimization problem defined in Eq. (6)

$$N_{im} = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - P\vec{r}_2\|^2
\quad \text{s.t.} \quad \vec{r}_1^T \vec{r}_2 = 0, \;\; \vec{r}_1^T \vec{r}_1 = \vec{r}_2^T \vec{r}_2$$

Section 6.1 shows that $N_{tr}$, computed in the previous section, can be used to bound $N_{im}$
from both above and below. Section 6.2 describes a direct method to compute a sub-optimal
approximation to $N_{im}$ and outlines an iterative algorithm to improve this estimate to obtain
the optimal $N_{im}$.

6.1 Bounding the image metric with the transformation metric 

In this section we show that using the transformation metric $N_{tr}$ defined in Section 5, and the
affine metric $N_{af}$ (given in Eq. (9)), we can bound the image metric $N_{im}$ from both above and
below. We prove the following theorem:

Theorem 4: Let $0 < \lambda_1 \le \lambda_2 \le \lambda_3$ denote the three eigenvalues of $P^T P$, then

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + \lambda_3 N_{tr} \tag{16}$$

Proof: Denote by $\vec{r}_1^*$ and $\vec{r}_2^*$ the vectors that minimize the term for the image metric given
in Eq. (6), namely

$$N_{im} = \|\vec{x} - P\vec{r}_1^*\|^2 + \|\vec{y} - P\vec{r}_2^*\|^2$$

and denote by $\vec{r}_1$ and $\vec{r}_2$ the vectors that minimize the transformation metric given in Eq. (13),
namely

$$N_{tr} = \|P^+\vec{x} - \vec{r}_1\|^2 + \|P^+\vec{y} - \vec{r}_2\|^2$$

We start by showing the upper bound. Since $\vec{r}_1^*$ and $\vec{r}_2^*$ minimize the term for $N_{im}$, we can
write

$$\begin{aligned}
N_{im} &= \|\vec{x} - P\vec{r}_1^*\|^2 + \|\vec{y} - P\vec{r}_2^*\|^2 \\
&\le \|\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - P\vec{r}_2\|^2
\end{aligned}$$

We now break each term in this sum into two orthogonal components as follows

$$\vec{x} - P\vec{r}_1 = (\vec{x} - PP^+\vec{x}) + (PP^+\vec{x} - P\vec{r}_1)$$

for which it holds that

$$(\vec{x} - PP^+\vec{x})^T (PP^+\vec{x} - P\vec{r}_1) = 0$$

The orthogonality readily follows from the identity

$$(PP^+)^T P = (P^+)^T P^T P = P(P^T P)^{-1}(P^T P) = P$$

Since the two components are orthogonal it holds that

$$\|\vec{x} - P\vec{r}_1\|^2 = \|\vec{x} - PP^+\vec{x}\|^2 + \|PP^+\vec{x} - P\vec{r}_1\|^2$$

and, similarly,

$$\|\vec{y} - P\vec{r}_2\|^2 = \|\vec{y} - PP^+\vec{y}\|^2 + \|PP^+\vec{y} - P\vec{r}_2\|^2$$

Therefore (recall that $\vec{r}_1$ and $\vec{r}_2$ minimize $N_{tr}$ and that $\lambda_3$ is the largest eigenvalue of $P^T P$)

$$\begin{aligned}
N_{im} &\le \|\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - P\vec{r}_2\|^2 \\
&= \|\vec{x} - PP^+\vec{x}\|^2 + \|PP^+\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - PP^+\vec{y}\|^2 + \|PP^+\vec{y} - P\vec{r}_2\|^2 \\
&= \|(I - PP^+)\vec{x}\|^2 + \|(I - PP^+)\vec{y}\|^2 + \|P(P^+\vec{x} - \vec{r}_1)\|^2 + \|P(P^+\vec{y} - \vec{r}_2)\|^2 \\
&= N_{af} + \|P(P^+\vec{x} - \vec{r}_1)\|^2 + \|P(P^+\vec{y} - \vec{r}_2)\|^2 \\
&\le N_{af} + \lambda_3\left(\|P^+\vec{x} - \vec{r}_1\|^2 + \|P^+\vec{y} - \vec{r}_2\|^2\right) \\
&= N_{af} + \lambda_3 N_{tr}
\end{aligned}$$

Next, we prove the lower bound. The proof is similar to the proof in the upper bound case,
but this time we start by breaking up the terms into orthogonal components. Then we use the
facts that $\vec{r}_1$ and $\vec{r}_2$ minimize $N_{tr}$ and that $\lambda_1$ is the smallest eigenvalue of $P^T P$.

$$\begin{aligned}
N_{im} &= \|\vec{x} - P\vec{r}_1^*\|^2 + \|\vec{y} - P\vec{r}_2^*\|^2 \\
&= \|\vec{x} - PP^+\vec{x}\|^2 + \|PP^+\vec{x} - P\vec{r}_1^*\|^2 + \|\vec{y} - PP^+\vec{y}\|^2 + \|PP^+\vec{y} - P\vec{r}_2^*\|^2 \\
&= \|(I - PP^+)\vec{x}\|^2 + \|(I - PP^+)\vec{y}\|^2 + \|P(P^+\vec{x} - \vec{r}_1^*)\|^2 + \|P(P^+\vec{y} - \vec{r}_2^*)\|^2 \\
&= N_{af} + \|P(P^+\vec{x} - \vec{r}_1^*)\|^2 + \|P(P^+\vec{y} - \vec{r}_2^*)\|^2 \\
&\ge N_{af} + \lambda_1\left(\|P^+\vec{x} - \vec{r}_1^*\|^2 + \|P^+\vec{y} - \vec{r}_2^*\|^2\right) \\
&\ge N_{af} + \lambda_1 N_{tr}
\end{aligned}$$

Consequently

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + \lambda_3 N_{tr}$$


□ 
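The two ingredients of this proof can be checked numerically. The sketch below uses randomly generated data (the matrix `P`, the vectors `x` and `r`, and all variable names are illustrative assumptions, not taken from the paper). It verifies the orthogonal split of the residual and the eigenvalue bounds on $\|Pd\|^2$ for an arbitrary candidate vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n model points (rows of P), centered at the origin.
n = 6
P = rng.normal(size=(n, 3))
P -= P.mean(axis=0)
x = rng.normal(size=n)          # one image coordinate vector
r = rng.normal(size=3)          # any candidate vector in R^3

P_pinv = np.linalg.pinv(P)      # P^+ = (P^T P)^{-1} P^T when P has full column rank
proj = P @ P_pinv               # orthogonal projector onto the column space of P

# Orthogonal split: x - P r = (x - P P^+ x) + (P P^+ x - P r)
off_space = x - proj @ x
in_space = proj @ x - P @ r
lhs = np.sum((x - P @ r) ** 2)
rhs = np.sum(off_space ** 2) + np.sum(in_space ** 2)

# Eigenvalue bounds: lambda_1 |d|^2 <= |P d|^2 <= lambda_3 |d|^2
lam = np.linalg.eigvalsh(P.T @ P)   # ascending eigenvalues of P^T P
d = P_pinv @ x - r
Pd2 = np.sum((P @ d) ** 2)
d2 = np.sum(d ** 2)
```

Both steps hold for any `r`, which is why the same decomposition serves the upper bound (with the minimizers of $N_{tr}$) and the lower bound (with the minimizers of $N_{im}$).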

6.2 Direct solution for the image metric 

In this section we develop tighter bounds on the image metric by direct methods, following
the same steps we took in the derivation of the transformation metric in Section 5. Unlike for
the transformation metric, we cannot obtain a closed-form solution for the image metric, but
we can obtain a better estimator than we have previously described. This also enables us to
develop an iterative method to compute the distance exactly.

In Section 6.2.1 we describe a change of coordinate system, arriving at a minimization
problem which is similar to the one we had to solve for the transformation metric. The difference
is that the sought vectors are constrained to lie on an ellipsoid rather than a sphere, and the
ellipsoid is defined by a 3 × 3 positive-definite version of the characteristic matrix B.

In Section 6.2.2 we restrict the solution vectors, u and v, to lie in a plane with the data vectors,
x and y, and we derive the optimal solution under this constraint. The solution, however, is only
sub-optimal, since in contrast to the transformation metric, the optimal solution in this case
does not have to lie in the plane. Using this solution we derive a tighter upper bound on the
optimal solution.

In Section 6.2.3 we describe the general problem that needs to be solved, and outline an
iterative method. We propose the solution obtained in the plane as an initial guess for this
method.


6.2.1 Reducing the dimensionality of the problem 

In Section 6.1 we have shown that the image metric can be broken into two orthogonal terms,
implying that

$$N_{im} = N_{af} + \|P(P^+x - r_1^*)\|^2 + \|P(P^+y - r_2^*)\|^2 \qquad (17)$$

This property is useful for a direct computation of the image metric. The first term, $N_{af}$, does
not depend on $r_1, r_2$. To compute $N_{im}$, therefore, only the second term needs to be minimized:

$$\min_{r_1, r_2 \in \mathcal{R}^3} \|PP^+x - P r_1\|^2 + \|PP^+y - P r_2\|^2 \quad \text{s.t.} \quad r_1^T r_2 = 0,\; r_1^T r_1 = r_2^T r_2 \qquad (18)$$


Note first that $PP^+x$ and $PP^+y$, two vectors in $\mathcal{R}^n$, both lie in a single linear subspace of
dimension 3. (This follows from the fact, shown in [Ullman and Basri, 1991], that every image
of a 3D object can be written as a linear combination of three independent views.) Moreover,
the three columns of P lie in the same subspace. It therefore follows that the vectors $u = P r_1$
and $v = P r_2$ must also lie in this subspace.

Denote $X = PP^+x$ and $Y = PP^+y$, the projections of x and y onto the column space of P,
and denote $u = P r_1$ and $v = P r_2$. (Note that $r_1 = P^+u$, $r_2 = P^+v$, and $B = (P^+)^T P^+$, the
characteristic matrix of the object.) We rewrite the problem as follows

$$\min_{u, v \in \mathcal{R}^n} \|X - u\|^2 + \|Y - v\|^2 \quad \text{s.t.} \quad u^T B v = 0,\; u^T B u = v^T B v \qquad (19)$$


Since all the vectors, X, Y, u, and v, lie in a 3D subspace (the column space of P) we can
perform the minimization in $\mathcal{R}^3$. To transform the system into $\mathcal{R}^3$, we rotate the vectors and the
characteristic matrix B so as to get nontrivial (nonzero) values only in three of the coordinates.
Recall that distances and quadratic forms are invariant under rotation. The rotation matrix $\Omega$
that should be applied to all terms is defined by the eigenvectors of B. Applying this matrix
to B (in the form $\Omega^T B \Omega$) results in a diagonal matrix with the three positive eigenvalues of B.


6.2.2 Closed-form solution in the plane 

Theorem 5: When u and v are limited to span{X, Y}, the solution of Eq. (19) is given by

$$N'_{im} = \frac{\mu_1\mu_2}{\mu_1 + \mu_2}\left(x^TBx + y^TBy - 2\sqrt{x^TBx \cdot y^TBy - (x^TBy)^2}\right) \qquad (20)$$

where $\sqrt{\mu_1} \le \sqrt{\mu_2}$ are the principal axes of the ellipse defined by the intersection of the ellipsoid
B with the plane span{X, Y}.


Note the similarity between this solution and $N_{tr}$ in Eq. (14). In fact,

$$N'_{im} = \frac{2\mu_1\mu_2}{\mu_1 + \mu_2} N_{tr} \qquad (21)$$


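Proof: The proof closely follows the proof for $N_{tr}$ presented in Section 5 (Theorem 2). We therefore
skip some of the details.

We first define a new coordinate system in which

$$\begin{aligned}
X &= w_1(\sqrt{\mu_1}\cos\eta,\; \sqrt{\mu_2}\sin\eta) \\
Y &= w_2(\sqrt{\mu_1}\cos\theta,\; \sqrt{\mu_2}\sin\theta) \\
u &= s(\sqrt{\mu_1}\cos\alpha,\; \sqrt{\mu_2}\sin\alpha) \\
v &= s(-\sqrt{\mu_1}\sin\alpha,\; \sqrt{\mu_2}\cos\alpha)
\end{aligned}$$

and

$$B = \begin{pmatrix} \frac{1}{\mu_1} & 0 & B_{13} \\ 0 & \frac{1}{\mu_2} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{pmatrix}$$
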

Without loss of generality it is assumed below that $-90° \le \eta \le 90°$, $\eta \le \theta \le \eta + 180°$, and
$-90° \le \alpha \le 90°$. Notice that $w_1$, $w_2$, $\eta$ and $\theta$ are given and that s and $\alpha$ are unknown.

Notice that this setting of coordinate system is similar to the one used in Theorem 1 with
the exceptions that here u and v lie on an ellipse rather than on a circle, and that in general
none of the points can be brought to lie on a principal axis.

Denote by f the term to be minimized, that is

$$f(\alpha, s) = \|X - u\|^2 + \|Y - v\|^2$$

then

$$\begin{aligned}
f(\alpha, s) &= \mu_1(w_1\cos\eta - s\cos\alpha)^2 + \mu_2(w_1\sin\eta - s\sin\alpha)^2 + \mu_1(w_2\cos\theta + s\sin\alpha)^2 + \mu_2(w_2\sin\theta - s\cos\alpha)^2 \\
&= w_1^2(\mu_1\cos^2\eta + \mu_2\sin^2\eta) + w_2^2(\mu_1\cos^2\theta + \mu_2\sin^2\theta) + s^2(\mu_1 + \mu_2) \\
&\quad - 2s(w_1\mu_1\cos\eta\cos\alpha + w_1\mu_2\sin\eta\sin\alpha - w_2\mu_1\cos\theta\sin\alpha + w_2\mu_2\sin\theta\cos\alpha)
\end{aligned}$$

The partial derivatives of f are given by

$$f_\alpha = 2s\left[(w_1\mu_1\cos\eta + w_2\mu_2\sin\theta)\sin\alpha - (w_1\mu_2\sin\eta - w_2\mu_1\cos\theta)\cos\alpha\right]$$

$$f_s = 2s(\mu_1 + \mu_2) - 2(w_1\mu_1\cos\eta\cos\alpha + w_1\mu_2\sin\eta\sin\alpha - w_2\mu_1\cos\theta\sin\alpha + w_2\mu_2\sin\theta\cos\alpha)$$

To find possible minima we equate these derivatives to zero

$$f_\alpha = 0, \qquad f_s = 0$$


Again, solutions with s = 0 can be ignored since they do not correspond to the global minimum 
(for a similar reason as in the proof of Theorem 1). 

When $s \ne 0$, $f_\alpha = 0$ implies

$$\tan\alpha^{min} = \frac{w_1\mu_2\sin\eta - w_2\mu_1\cos\theta}{w_1\mu_1\cos\eta + w_2\mu_2\sin\theta}$$

$f_s = 0$ implies

$$s^{min} = \frac{w_1\mu_1\cos\eta + w_2\mu_2\sin\theta}{\cos\alpha^{min}\,(\mu_1 + \mu_2)}$$

and, similarly to Eq. (15),

$$f^{min} = w_1^2(\mu_1\cos^2\eta + \mu_2\sin^2\eta) + w_2^2(\mu_1\cos^2\theta + \mu_2\sin^2\theta) - (\mu_1 + \mu_2)(s^{min})^2 \qquad (22)$$


We substitute $s^{min}$ and $\cos\alpha^{min}$, using the identity $\cos^2\alpha = \frac{1}{1 + \tan^2\alpha}$, into Eq. (22). After
some manipulations, we obtain

$$N'_{im} = f^{min} = \frac{\mu_1\mu_2}{\mu_1 + \mu_2}\left(w_1^2 + w_2^2 - 2w_1w_2\sin(\theta - \eta)\right) \qquad (23)$$

Note that

$$(PP^+)^T B (PP^+) = (P^+)^T P^T (P^+)^T P^+ P P^+ = (P^+)^T (P^+P)^T (P^+P) P^+ = (P^+)^T P^+ = B \qquad (24)$$

from which it follows that

$$\begin{aligned}
w_1^2 &= X^TBX = x^TBx \\
w_2^2 &= Y^TBY = y^TBy \\
w_1w_2\cos(\theta - \eta) &= X^TBY = x^TBy
\end{aligned} \qquad (25)$$

We substitute the identities from Eq. (25) into Eq. (23), obtaining the expression for $N'_{im}$
in Eq. (20).


The derivation for $\mu_1$ and $\mu_2$ is given in Appendix E.
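As a sanity check on Theorem 5, the closed form can be compared with a brute-force search over the in-plane angle. The sketch below is illustrative only: the data are random, the function names are hypothetical, and $N_{tr}$ is taken in the form $\frac{1}{2}\left(x^TBx + y^TBy - 2\sqrt{x^TBx \cdot y^TBy - (x^TBy)^2}\right)$, which is what Eqs. (20) and (21) together imply. The search tries both orientations of v, which corresponds to the paper's without-loss-of-generality choice of $\theta$.

```python
import numpy as np

def in_plane_image_term(P, x, y):
    """Closed-form in-plane value N'_im, computed through Eq. (21)."""
    P_pinv = np.linalg.pinv(P)
    B = P_pinv.T @ P_pinv                          # characteristic matrix
    X, Y = P @ P_pinv @ x, P @ P_pinv @ y          # projections onto col(P)
    A, _ = np.linalg.qr(np.column_stack([X, Y]))   # orthonormal basis of span{X, Y}
    inv_mu, E = np.linalg.eigh(A.T @ B @ A)        # section eigenvalues are 1/mu
    mu = 1.0 / inv_mu
    a, b, c = x @ B @ x, y @ B @ y, x @ B @ y
    n_tr = 0.5 * (a + b - 2.0 * np.sqrt(a * b - c * c))
    return 2.0 * mu[0] * mu[1] / (mu[0] + mu[1]) * n_tr, (A, E, mu, X, Y)

def brute_force(A, E, mu, X, Y, steps=20000):
    """Grid search over alpha; the scale s is solved analytically per angle."""
    Xp, Yp = E.T @ (A.T @ X), E.T @ (A.T @ Y)      # coordinates along the ellipse axes
    alpha = np.linspace(0.0, 2.0 * np.pi, steps, endpoint=False)
    r1, r2 = np.sqrt(mu)
    u0 = np.stack([r1 * np.cos(alpha), r2 * np.sin(alpha)])
    best = np.inf
    for sign in (1.0, -1.0):                       # both orientations of v
        v0 = sign * np.stack([-r1 * np.sin(alpha), r2 * np.cos(alpha)])
        corr = Xp @ u0 + Yp @ v0
        norm2 = np.sum(u0 ** 2, axis=0) + np.sum(v0 ** 2, axis=0)
        f = Xp @ Xp + Yp @ Yp - corr ** 2 / norm2  # cost at the optimal s
        best = min(best, f.min())
    return best

rng = np.random.default_rng(4)
P = rng.normal(size=(6, 3))
P -= P.mean(axis=0)
x, y = rng.normal(size=6), rng.normal(size=6)
closed, aux = in_plane_image_term(P, x, y)
brute = brute_force(*aux)
```

The two values should agree up to the grid resolution; adding $N_{af}$ to either gives the in-plane upper bound on $N_{im}$.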

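The sub-optimal solution in the plane can be used to improve the bounds on the image
metric, which were previously discussed in Theorem 4.

Theorem 6: Let $0 < \lambda_1 \le \lambda_2 \le \lambda_3$ be the three eigenvalues of $P^TP$, then

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + \mathrm{H.M.}\{\lambda_2, \lambda_3\} N_{tr} \qquad (26)$$

where $\mathrm{H.M.}\{\lambda_2, \lambda_3\} = \frac{2}{\frac{1}{\lambda_2} + \frac{1}{\lambda_3}}$ is the harmonic mean of $\lambda_2$ and $\lambda_3$.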

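Proof: The eigenvalues of the characteristic matrix B are $\frac{1}{\lambda_1}$, $\frac{1}{\lambda_2}$ and $\frac{1}{\lambda_3}$. (This is shown
in Appendix E.) Since $1/\mu_1$ and $1/\mu_2$ represent the eigenvalues of a section of B it holds that
(see, e.g., [Strang, 1976] p. 270)

$$\frac{1}{\lambda_1} \ge \frac{1}{\mu_1} \ge \frac{1}{\lambda_2} \ge \frac{1}{\mu_2} \ge \frac{1}{\lambda_3}$$

Using Eq. (21) we obtain that

$$N'_{im} = \frac{2\mu_1\mu_2}{\mu_1 + \mu_2} N_{tr} = \frac{2}{\frac{1}{\mu_1} + \frac{1}{\mu_2}} N_{tr} \le \frac{2}{\frac{1}{\lambda_2} + \frac{1}{\lambda_3}} N_{tr} = \mathrm{H.M.}\{\lambda_2, \lambda_3\} N_{tr}$$

And, using Eq. (17) we obtain the upper bound

$$N_{im} \le N_{af} + N'_{im} \le N_{af} + \mathrm{H.M.}\{\lambda_2, \lambda_3\} N_{tr}$$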


□ 


Corollary 7:

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + \frac{2\mu_1\mu_2}{\mu_1 + \mu_2} N_{tr} \qquad (27)$$

Note that, since $\mathrm{H.M.}\{a, b\} \le 2\min\{a, b\}$ for every a, b, we have the following corollary.

Corollary 8:

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + 2\lambda_2 N_{tr}$$
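The inequality behind Corollary 8 takes one line. The derivation below is a short worked step, assuming $a, b > 0$ (which holds here because the eigenvalues of $P^TP$ are positive):

```latex
\mathrm{H.M.}\{a,b\} \;=\; \frac{2ab}{a+b}
\;=\; 2\min\{a,b\}\cdot\frac{\max\{a,b\}}{a+b}
\;\le\; 2\min\{a,b\},
\qquad\text{hence}\qquad
\mathrm{H.M.}\{\lambda_2,\lambda_3\} \;\le\; 2\lambda_2 .
```

The factor $\max\{a,b\}/(a+b)$ is below 1 and approaches 1 only when one argument dominates, so the bound $2\lambda_2$ is loosest when $\lambda_2$ and $\lambda_3$ are close.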

We cannot yet improve the lower bound in Theorem 4; but we conjecture that

Conjecture 1: Let $0 < \lambda_1 \le \lambda_2 \le \lambda_3$ be the three eigenvalues of $P^TP$, then

$$N_{af} + \mathrm{H.M.}\{\lambda_1, \lambda_2\} N_{tr} \le N_{im} \le N_{af} + \mathrm{H.M.}\{\lambda_2, \lambda_3\} N_{tr} \qquad (28)$$

Motivation: We know that if the two data points X, Y lie on the ellipse whose principal
axes are of length $\lambda_1, \lambda_2$ (the smallest cross-section of the ellipsoid B), then

$$N_{im} = N_{af} + \mathrm{H.M.}\{\lambda_1, \lambda_2\} N_{tr}$$

We can show that this solution is a local minimum, namely, it is not possible to improve the 
solution by applying small perturbations to the solution vectors. 

□ 


6.2.3 An iterative optimal solution 

The solution we obtained in Theorem 5 is sub-optimal; it is not the lowest distance. We now 
give the cost function, a function of four variables, which should be minimized to obtain the 
precise value of the image metric. 

We first define a coordinate system such that

$$\begin{aligned}
X &= w_1(\sqrt{\lambda_1}\cos\theta\cos\nu,\; \sqrt{\lambda_2}\cos\theta\sin\nu,\; \sqrt{\lambda_3}\sin\theta) \\
Y &= w_2(\sqrt{\lambda_1}\cos\zeta\cos\eta,\; \sqrt{\lambda_2}\cos\zeta\sin\eta,\; \sqrt{\lambda_3}\sin\zeta) \\
u &= s(\sqrt{\lambda_1}\cos\alpha\cos\beta,\; \sqrt{\lambda_2}\cos\alpha\sin\beta,\; \sqrt{\lambda_3}\sin\alpha) \\
v &= s(\sqrt{\lambda_1}(\sin\beta\cos\gamma + \sin\alpha\cos\beta\sin\gamma),\; \sqrt{\lambda_2}(-\cos\beta\cos\gamma + \sin\alpha\sin\beta\sin\gamma),\; -\sqrt{\lambda_3}\cos\alpha\sin\gamma)
\end{aligned}$$

where $w_1$, $w_2$, $\lambda_1$, $\lambda_2$, $\lambda_3$, $\zeta$, $\eta$, $\theta$, and $\nu$ are known, and s, $\alpha$, $\beta$ and $\gamma$ are free.

Note that this setting of coordinate system is similar to the one used in Theorem 5, but
now u and v lie on an ellipsoid rather than on an ellipse.
In this notation the free parameters are selected so as to satisfy the two rigid constraints,
$u^TBu = v^TBv$ and $u^TBv = 0$. To compute the image metric, the following function should be
minimized.

$$\begin{aligned}
f(s, \alpha, \beta, \gamma) = \; & \lambda_1(s\cos\alpha\cos\beta - w_1\cos\theta\cos\nu)^2 + \lambda_2(s\cos\alpha\sin\beta - w_1\cos\theta\sin\nu)^2 \\
& + \lambda_3(s\sin\alpha - w_1\sin\theta)^2 \\
& + \lambda_1(s\sin\beta\cos\gamma + s\sin\alpha\cos\beta\sin\gamma - w_2\cos\zeta\cos\eta)^2 \\
& + \lambda_2(-s\cos\beta\cos\gamma + s\sin\alpha\sin\beta\sin\gamma - w_2\cos\zeta\sin\eta)^2 \\
& + \lambda_3(-s\cos\alpha\sin\gamma - w_2\sin\zeta)^2
\end{aligned} \qquad (29)$$

By Eq. (17), $N_{im}$ equals $N_{af}$ plus the global minimum of $f(s, \alpha, \beta, \gamma)$. Assuming that
$f(s, \alpha, \beta, \gamma)$ is convex in the area that contains both the global minimum and the
sub-optimal in-plane solution of Theorem 5 (whose value is $N'_{im}$), we can employ the following
iterative method to compute $N_{im}$:

1. compute $N'_{im}$ and take the corresponding in-plane solution as the initial guess;

2. improve the solution by any gradient-descent method until a local minimum is obtained.

If the convexity assumption is correct, this method returns the correct image metric; otherwise
it may return a sub-optimal solution.
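A minimal sketch of step 2 follows, under several assumptions not in the original. Coordinates are taken in the eigenbasis of B, so that B = diag(1/λ). The data `X`, `Y`, `lam` and the start point `p0` are arbitrary illustrative values rather than the in-plane solution. A finite-difference gradient with backtracking stands in for "any gradient-descent method". The parametrization satisfies both rigid constraints by construction, so the descent is unconstrained.

```python
import numpy as np

def uv_from_params(s, alpha, beta, gamma, lam):
    """Vectors u, v on the ellipsoid, in the eigenbasis of B (B = diag(1/lam))."""
    r = np.sqrt(lam)
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    u = s * r * np.array([ca * cb, ca * sb, sa])
    v = s * r * np.array([sb * cg + sa * cb * sg,
                          -cb * cg + sa * sb * sg,
                          -ca * sg])
    return u, v

def cost(params, X, Y, lam):
    """f(s, alpha, beta, gamma) = |X - u|^2 + |Y - v|^2, as in Eq. (29)."""
    u, v = uv_from_params(*params, lam)
    return np.sum((X - u) ** 2) + np.sum((Y - v) ** 2)

def numeric_grad(fun, p, eps=1e-6):
    """Central finite-difference gradient (a stand-in for analytic derivatives)."""
    g = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        g[i] = (fun(p + step) - fun(p - step)) / (2.0 * eps)
    return g

def descend(fun, p0, iters=200, lr0=1.0):
    """Gradient descent with step halving; stops when no step decreases the cost."""
    p, f = p0.copy(), fun(p0)
    for _ in range(iters):
        g = numeric_grad(fun, p)
        lr = lr0
        while lr > 1e-12:
            q = p - lr * g
            fq = fun(q)
            if fq < f:
                p, f = q, fq
                break
            lr *= 0.5
        else:
            break  # no improving step found: treat as a local minimum
    return p, f

lam = np.array([1.0, 2.0, 4.0])          # hypothetical eigenvalues of P^T P
rng = np.random.default_rng(7)
X, Y = rng.normal(size=3), rng.normal(size=3)
p0 = np.array([1.0, 0.1, 0.2, 0.3])      # arbitrary start (s, alpha, beta, gamma)
B = np.diag(1.0 / lam)
u, v = uv_from_params(*p0, lam)
f0 = cost(p0, X, Y, lam)
p_opt, f_opt = descend(lambda p: cost(p, X, Y, lam), p0)
```

As the text notes, the result is only guaranteed to be the global minimum if the cost is convex between the start point and the optimum. Starting from the in-plane solution rather than an arbitrary `p0` is what the paper proposes.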


7 Simulations 

To test the presented metric we have compared it with the alignment method. As was mentioned 
in Section 2 the alignment method involves the selection of a small subset of correspondences 
(alignment key), solving for the transformation using this subset, and then transforming the 
rest of the points and measuring their distance from the corresponding image points. The 
obtained distance critically depends on the choice of alignment key. Different choices produce 
different distance measures between the model and the image. The results are almost always 
sub-optimal, since it is usually better to match all points with small errors than to exactly 
match a subset of points and project the errors entirely onto the others. 

In our simulations, models composed of four points were projected to the image using weak
perspective projection. Gaussian noise (with standard deviation 0.05 of the radius of the 3D
object) was added to the obtained images. Using the expression for $N_{tr}$ given in Eq. (14), we
computed the upper and lower bounds on the image metric between the model and the noisy
images. In addition, we computed the corresponding alignment distances, each reflecting the
distance between one model point and its predicted projection in the image after the alignment
of the remaining three image points to the model.
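The setup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' simulation code: the random model, the scale 1.3, the definition of "radius" as the largest point norm, and the function names are all assumptions. The bounds are those of Eq. (26), with $N_{tr}$ taken in the closed form implied by Eqs. (20) and (21).

```python
import numpy as np

rng = np.random.default_rng(1)

def random_rotation(rng):
    """Random 3x3 rotation from a QR factorization (sign fixed so det = +1)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def image_metric_bounds(P, x, y):
    """Lower and upper bounds of Eq. (26) on the image metric."""
    P_pinv = np.linalg.pinv(P)
    B = P_pinv.T @ P_pinv
    resid = np.eye(len(P)) - P @ P_pinv
    n_af = np.sum((resid @ x) ** 2) + np.sum((resid @ y) ** 2)
    a, b, c = x @ B @ x, y @ B @ y, x @ B @ y
    n_tr = 0.5 * (a + b - 2.0 * np.sqrt(max(a * b - c * c, 0.0)))
    lam = np.linalg.eigvalsh(P.T @ P)                  # ascending
    hm = 2.0 * lam[1] * lam[2] / (lam[1] + lam[2])     # H.M.{lambda_2, lambda_3}
    return n_af + lam[0] * n_tr, n_af + hm * n_tr

# Four-point model, centroid at the origin.
P = rng.normal(size=(4, 3))
P -= P.mean(axis=0)

# Weak-perspective view: rotate, drop depth, scale.
R, scale = random_rotation(rng), 1.3
x_clean, y_clean = scale * P @ R[0], scale * P @ R[1]
lower_clean, upper_clean = image_metric_bounds(P, x_clean, y_clean)

# Add Gaussian noise with sigma = 0.05 * object radius, then re-center the image.
sigma = 0.05 * np.max(np.linalg.norm(P, axis=1))
x = x_clean + rng.normal(scale=sigma, size=4)
y = y_clean + rng.normal(scale=sigma, size=4)
x, y = x - x.mean(), y - y.mean()
lower, upper = image_metric_bounds(P, x, y)
```

For a noise-free rigid view both bounds collapse to zero, which is a convenient check before adding noise.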

The figures below summarize our results. Figure 3 shows the percentage of alignment
distances which actually lie within the bounds on the image metric computed by our metric
(given in Eq. (26)). It can be seen that when the bounds are relatively tight (when the condition
number of the characteristic matrix B is relatively low) most of the alignment solutions
exceed the upper bound. Only when the condition number gets larger do the alignment distances
lie within the bounds. When a tighter upper bound is used (Eq. (27)), a smaller portion of the
alignment distances actually lie within the bounds.

Figure 3: The percent of alignment distances which lie within the bounds on the image metric computed from
our closed-form equations. The abscissa gives the condition number of the characteristic matrix, B, which
determines how far apart the lower and upper bounds on the image metric are. The larger the condition number
is, the further apart the bounds are. Solid graph: alignment distances relative to the wide bounds from Eq. (26).
Dashed lines: alignment distances relative to the tight upper bound from Eq. (27).

Figure 4 shows the maximal and minimal alignment distances obtained in different runs
relative to the upper and lower bounds on the image metric, given in Eq. (26) and Eq. (27). It
can be seen that in many cases even the best alignment solution (the one that minimizes the
distance) still exceeds the upper bound.


8 Summary 

We have proposed a transformation metric to measure the similarity between 3D models and
2D images. The transformation metric measures the amount of affine deformation applied to
the object to produce the given image. A simple, closed-form solution for this metric has been 
presented. This solution is optimal in transformation space, and it is used to bound the image 
metric from both above and below. 



Figure 4: The maximal and minimal alignment distances are plotted for a number of models and objects,
varying along the abscissa. The distances in these plots were normalized so as to obtain constant lower and
upper bounds (the lower bound is set to 1; the upper bound is set to be the average ratio of the upper bound to
the lower bound in each sequence of runs). Small (between 1.5 and 2.5) and large (between 4.5 and 5.5) condition
numbers are used, and the results are compared to both the wide (Eq. (26)) and the tight (Eq. (27)) bounds.
(a) Small condition number, wide bounds. (b) Small condition number, tight bounds. (c) Large condition number,
wide bounds. (d) Large condition number, tight bounds.

The transformation metric presented in this paper can be used in several different ways in
the recognition and classification tasks:

1. It provides a direct assessment of the similarity between models and images. Measuring 
the amount of deformation applied to the objects makes it suited for the task of object 
classification where the uncertainty in the structure of the observed objects is inherent. 

2. The transformation metric can be used to bound the image metric, the distance between
the image and the closest view of the object, from both above and below. As shown
by our simulations, these bounds often provide better estimates than those provided by 
using alignment. Consequently, we believe that in many cases the bounds suffice to 
unequivocally determine the identity of the observed object. 

3. The transformation metric provides a sub-optimal closed-form estimate for the image 
metric. A scheme which uses this measure will prefer “symmetric” objects, objects whose 
convex-hull is close to a sphere, over other objects which are significantly stretched or 
contracted along one spatial dimension. This solution can also be used as an initial guess 
in an iterative process that computes the optimal value of the image metric numerically. 


Appendices 

A Metric properties 

The measures described in this paper compare entities of different dimensionalities: 3D objects
and 2D images. We define a metric for comparing such entities as follows. Let P be a set of n
model points, and let q be a set of n corresponding image points. A distance function, $N(P, q)$,
defined using a difference function $d(q, q')$ between two views (see Section 3), is called a metric
if

1. $N(P, q) \ge 0$ for every model P and image q.

2. $N(P, q) = 0$ if, and only if, q is a rigid view of P.

3. $\forall q'$, $N(P, q) \le N(P, q') + d(q, q')$.

For the image metric, $N_{im}$, d is simply the Euclidean distance between corresponding points in
the compared images. It is straightforward to see that the conditions hold for this case. In the
rest of this appendix we prove that these conditions also hold for the transformation metric,
$N_{tr}$.

Transformation metric


The transformation metric, $N_{tr}$, measures the amount of “affine deformation” applied to the
object in the image. The metric conditions for $N_{tr}$ are defined as follows.

1. $N(P, q) \ge 0$ for every model P and image q.

2. $N(P, q) = 0$ if, and only if, there exists a rigid view which coincides with $PP^+q$. (In
other words, the best affine view of the object is a rigid view and there is no “affine
deformation”.)

3. $\forall q'$, $N(P, q) \le N(P, q') + \|P^+(q - q')\|$

Theorem 9: $N_{tr}$ is a metric.


Proof: 


1. $N_{tr} \ge 0$. $N_{tr}$ minimizes a non-negative distance function. It is therefore always
non-negative.

2. $N_{tr} = 0$ if, and only if, the best affine view is rigid. Denote by x and y the x and y coordinates
of the points in q. According to Eq. (14),

$$\begin{aligned}
N_{tr} = 0 &\iff (x^TBx + y^TBy)^2 = 4\left(x^TBx \cdot y^TBy - (x^TBy)^2\right) \\
&\iff (x^TBx)^2 + 2(x^TBx \cdot y^TBy) + (y^TBy)^2 = 4(x^TBx \cdot y^TBy) - 4(x^TBy)^2 \\
&\iff (x^TBx - y^TBy)^2 = -4(x^TBy)^2
\end{aligned}$$

This equation holds if, and only if, both sides are zero, implying that

$$x^TBx = y^TBy, \qquad x^TBy = 0$$

The best affine view of the object is given by $PP^+x$, $PP^+y$. Following Eq. (11), the best
affine view also satisfies the rigidity constraints above, and therefore it forms a rigid view.

3. The metric $N_{tr}$ is defined in Eq. (13) as:

$$N_{tr}(P, q) = \min_{r_1, r_2 \in \mathcal{R}^3} \|P^+x - r_1\|^2 + \|P^+y - r_2\|^2 \quad \text{s.t.} \quad r_1^T r_2 = 0,\; r_1^T r_1 = r_2^T r_2$$

Let $w_1$ and $w_2$ be the optimal vectors for $q'$, that is

$$N_{tr}(P, q') = \|P^+x' - w_1\|^2 + \|P^+y' - w_2\|^2$$

And we obtain

$$\begin{aligned}
N_{tr}(P, q') + \|P^+(q - q')\| &= \|P^+x' - w_1\|^2 + \|P^+y' - w_2\|^2 + \|P^+x - P^+x'\|^2 + \|P^+y - P^+y'\|^2 \\
&\ge \|P^+x - w_1\|^2 + \|P^+y - w_2\|^2 \\
&\ge \min_{r_1, r_2 \in \mathcal{R}^3} \|P^+x - r_1\|^2 + \|P^+y - r_2\|^2 = N_{tr}(P, q)
\end{aligned}$$

B The computation of the characteristic matrix 

In Eq. (10) the characteristic matrix B was defined using the matrix of Euclidean model point
coordinates P. We now give a more general (though equivalent) definition of B using a matrix of
affine model point coordinates Q. Namely, the point coordinates in Q are given in a coordinate
system whose axes are not necessarily orthonormal. This definition makes it possible to compute
B directly from three or more images with a completely linear algorithm, which requires no
more than a pseudo-inverse.

We select an affine coordinate system whose independent axes are defined by three of the
object points, to be called the basis points. Let $P_{bas}$ denote the submatrix of P corresponding
to the coordinates of the basis points, and let Q denote the affine coordinates of all the object
points in this basis. It immediately follows that:

$$P = Q \cdot P_{bas}$$

Let $B_{bas}$ denote the characteristic matrix of the three basis points. From Eq. (10) it follows that

$$B_{bas} = (P_{bas}^{-1})^T P_{bas}^{-1} \qquad (30)$$

Finally, from the definition of the pseudo-inverse it can be readily verified that

$$P^+ = (Q \cdot P_{bas})^+ = P_{bas}^{-1} Q^+ \qquad (31)$$

We now describe B in terms of Q and $B_{bas}$. Substituting Eq. (31) into the definition of B
in Eq. (10), and using Eq. (30), we obtain

$$B = (P^+)^T P^+ = (Q^+)^T \cdot B_{bas} \cdot Q^+$$
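The equivalence of the two definitions can be checked numerically. The sketch below uses a random model and, as an assumption for illustration, takes the first three points as the basis points. It is not the incremental algorithm of the cited reference.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 7
P = rng.normal(size=(n, 3))             # Euclidean model coordinates (hypothetical data)
P_bas = P[:3]                           # first three points chosen as basis points
P_bas_inv = np.linalg.inv(P_bas)
Q = P @ P_bas_inv                       # affine coordinates, so that P = Q P_bas

# Direct definition, B = (P^+)^T P^+.
P_pinv = np.linalg.pinv(P)
B_direct = P_pinv.T @ P_pinv

# Affine definition, B = (Q^+)^T B_bas Q^+ with B_bas from Eq. (30).
B_bas = P_bas_inv.T @ P_bas_inv
Q_pinv = np.linalg.pinv(Q)
B_affine = Q_pinv.T @ B_bas @ Q_pinv
```

Eq. (31) relies on Q having full column rank and $P_{bas}$ being invertible, which requires the three basis points to be linearly independent as vectors.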

The linear and incremental computation of the matrices Q and $B_{bas}$ from at least three
images of the object points is described in [Weinshall and Tomasi, 1992].

C Eliminating translation


In this appendix we show that translation can be ignored if we set the centroids of both model
and image points to be the origin. To show this, we prove that the best rigid and affine
transformations map the model centroid to the image centroid. We begin by showing that,
given two sets of n 2D points (images), the best translation that relates the two images maps
the centroid of the first image to that of the second.


Lemma 10: Let $p_1, \ldots, p_n \in \mathcal{R}^2$ and $q_1, \ldots, q_n \in \mathcal{R}^2$ be two sets of corresponding points.
Denote by $\bar{p} = \frac{1}{n}\sum_{i=1}^n p_i$ and $\bar{q} = \frac{1}{n}\sum_{i=1}^n q_i$ the centroids of $p_1, \ldots, p_n$ and $q_1, \ldots, q_n$ respectively.
The translation $t^* \in \mathcal{R}^2$ that minimizes the term

$$D^* = \min_t \sum_{i=1}^n \|p_i + t - q_i\|^2$$

is given by

$$t^* = \bar{q} - \bar{p}$$


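Proof: Assume, by way of contradiction, that the best translation is given by

$$t' = t^* + \delta$$

for some nonzero $\delta \in \mathcal{R}^2$. Denote the new term by $D'$:

$$\begin{aligned}
D' &= \sum_{i=1}^n \|p_i + t' - q_i\|^2 \\
&= \sum_{i=1}^n \|p_i + t^* + \delta - q_i\|^2 \\
&= \sum_{i=1}^n \|p_i + t^* - q_i\|^2 + 2\sum_{i=1}^n (p_i + t^* - q_i) \cdot \delta + \sum_{i=1}^n \|\delta\|^2 \\
&= D^* + 2n(\bar{p} + t^* - \bar{q}) \cdot \delta + n\|\delta\|^2
\end{aligned}$$

Since $t^* = \bar{q} - \bar{p}$, we obtain that

$$\bar{p} + t^* - \bar{q} = 0$$

and, therefore,

$$D' = D^* + n\|\delta\|^2$$

which implies that

$$D^* < D'$$

contradicting the initial assumption.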

□ 
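Lemma 10 is easy to confirm numerically. The following sketch uses random point sets (hypothetical data). It checks that perturbing the centroid-matching translation by any $\delta$ raises the cost by exactly $n\|\delta\|^2$.

```python
import numpy as np

rng = np.random.default_rng(3)

p = rng.normal(size=(10, 2))            # first point set (hypothetical data)
q = rng.normal(size=(10, 2))            # corresponding second point set

def translation_cost(t):
    """Sum of squared distances after translating p by t."""
    return np.sum((p + t - q) ** 2)

t_star = q.mean(axis=0) - p.mean(axis=0)    # centroid-to-centroid translation
d_star = translation_cost(t_star)

delta = rng.normal(size=2)                  # arbitrary nonzero perturbation
d_prime = translation_cost(t_star + delta)
```

The cross term vanishes because the translated residuals sum to zero, which is the step the proof isolates.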


Using Lemma 10 we prove that the best rigid and affine transformations map the model
centroid to the image centroid.

Theorem 11: Let $P_1, \ldots, P_n \in \mathcal{R}^3$ be a set of n model points, and let $q_1, \ldots, q_n \in \mathcal{R}^2$ be the
corresponding n image points. The rigid transformation $\{s^*, R^*, t^*\}$ that minimizes the term

$$D^* = \min_{s, R, t} \sum_{i=1}^n \|s \Pi R P_i + t - q_i\|^2$$

where $\Pi$ denotes the orthographic projection, satisfies

$$\bar{q} = s^* \Pi R^* \bar{P} + t^*$$


Proof: Denote $p_i = s^* \Pi R^* P_i$; according to Lemma 10

$$t^* = \bar{q} - \bar{p}$$

Since

$$\bar{p} = \frac{1}{n}\sum_{i=1}^n p_i = \frac{1}{n}\sum_{i=1}^n s^* \Pi R^* P_i = s^* \Pi R^* \bar{P}$$

we obtain that

$$\bar{q} = \bar{p} + t^* = s^* \Pi R^* \bar{P} + t^*$$

The theorem also holds if we consider affine transformations rather than only the rigid ones.
The rotation matrix R is replaced in this case by a general linear transformation A.

□ 


Theorem 11 shows that the best rigid and affine transformations map the model centroid to
the image centroid. Consequently, if the two centroids are moved to the origin, the translation
component vanishes. This follows immediately from Theorem 11, since

$$\bar{q} = s^* \Pi R^* \bar{P} + t^*$$

then

$$\bar{P} = \bar{q} = 0$$

implies

$$t^* = 0$$


D Best View 


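In this appendix we develop an expression for the best view of the transformation metric, $N_{tr}$.
The derivations here follow the notations used in the proof of Theorem 1, from which we have
that

$$\begin{aligned}
s &= \tfrac{1}{2}\sqrt{w_1^2 + w_2^2 + 2w_1w_2\sin\theta} \\
s\cos\alpha &= \tfrac{1}{2}(w_1 + w_2\sin\theta) \\
s\sin\alpha &= \tfrac{1}{2}w_2\cos\theta
\end{aligned}$$

According to Theorem 2, $\hat{r}_1, \hat{r}_2 \in \mathrm{span}\{a_1, a_2\}$. We can therefore express $\hat{r}_1$ and $\hat{r}_2$ by

$$\begin{aligned}
\hat{r}_1 &= \beta_1 a_1 + \beta_2 a_2 \\
\hat{r}_2 &= \gamma_1 a_1 + \gamma_2 a_2
\end{aligned}$$

where $\beta_1$, $\beta_2$, $\gamma_1$, and $\gamma_2$ are scalars. Substituting the definitions of the vectors $\hat{r}_1$, $\hat{r}_2$, $a_1$, and
$a_2$ we obtain

$$\begin{aligned}
s\cos\alpha &= \beta_1 w_1 + \beta_2 w_2\cos\theta \\
-s\sin\alpha &= \beta_2 w_2\sin\theta
\end{aligned}$$

and

$$\begin{aligned}
s\sin\alpha &= \gamma_1 w_1 + \gamma_2 w_2\cos\theta \\
s\cos\alpha &= \gamma_2 w_2\sin\theta
\end{aligned}$$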


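Therefore

$$\begin{aligned}
\beta_1 &= \frac{s\sin\alpha\cos\theta + s\cos\alpha\sin\theta}{w_1\sin\theta} &
\beta_2 &= -\frac{s\sin\alpha}{w_2\sin\theta} \\
\gamma_1 &= \frac{s\sin\alpha\sin\theta - s\cos\alpha\cos\theta}{w_1\sin\theta} &
\gamma_2 &= \frac{s\cos\alpha}{w_2\sin\theta}
\end{aligned}$$

Substituting for s and $\alpha$ we obtain

$$\begin{aligned}
\beta_1 &= \frac{1}{2}\left(1 + \frac{w_2}{w_1\sin\theta}\right) \\
\beta_2 = \gamma_1 &= -\frac{\cos\theta}{2\sin\theta} \\
\gamma_2 &= \frac{1}{2}\left(1 + \frac{w_1}{w_2\sin\theta}\right)
\end{aligned}$$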


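And substituting for $w_1$, $w_2$, and $\theta$

$$\begin{aligned}
\beta_1 &= \frac{1}{2}\left(1 + \frac{y^TBy}{\sqrt{x^TBx \cdot y^TBy - (x^TBy)^2}}\right) \\
\beta_2 = \gamma_1 &= -\frac{x^TBy}{2\sqrt{x^TBx \cdot y^TBy - (x^TBy)^2}} \\
\gamma_2 &= \frac{1}{2}\left(1 + \frac{x^TBx}{\sqrt{x^TBx \cdot y^TBy - (x^TBy)^2}}\right)
\end{aligned}$$

Now, to obtain the best view we use the following identities

$$\begin{aligned}
x^* &= P\hat{r}_1, & \hat{r}_1 &= \beta_1 a_1 + \beta_2 a_2, & a_1 &= P^+x \\
y^* &= P\hat{r}_2, & \hat{r}_2 &= \gamma_1 a_1 + \gamma_2 a_2, & a_2 &= P^+y
\end{aligned}$$

Therefore

$$\begin{aligned}
x^* &= PP^+(\beta_1 x + \beta_2 y) \\
y^* &= PP^+(\gamma_1 x + \gamma_2 y)
\end{aligned}$$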

E Computing the eigenvalues of an ellipse 


In this appendix we compute the eigenvalues of the ellipsoid B and the eigenvalues of an elliptic
section of this ellipsoid.

We first show that the eigenvalues of the characteristic matrix, B, are $\frac{1}{\lambda_1}$, $\frac{1}{\lambda_2}$, and $\frac{1}{\lambda_3}$,
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the three positive eigenvalues of $P^TP$. This is derived as follows.

$$Ba = \frac{1}{\lambda}a \iff P(P^TP)^{-1}(P^TP)^{-1}P^Ta = \frac{1}{\lambda}a$$

Multiplying both sides by $P^T$ we obtain that

$$(P^TP)^{-1}P^Ta = \frac{1}{\lambda}P^Ta$$

Denote $b = P^Ta$

$$(P^TP)^{-1}b = \frac{1}{\lambda}b$$

which implies that

$$(P^TP)b = \lambda b$$
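The relation between the two spectra can be confirmed on a random model matrix. The data and variable names in this sketch are illustrative assumptions. It also checks the eigenvector map $b = P^Ta$ used in the derivation.

```python
import numpy as np

rng = np.random.default_rng(5)

P = rng.normal(size=(8, 3))                 # hypothetical model matrix, full column rank
P_pinv = np.linalg.pinv(P)
B = P_pinv.T @ P_pinv                       # n x n characteristic matrix, rank 3

lam = np.linalg.eigvalsh(P.T @ P)           # three positive eigenvalues, ascending
vals, vecs = np.linalg.eigh(B)              # ascending; the first n - 3 are ~0
nonzero = vals[-3:]                         # the three positive eigenvalues of B

# Eigenvector map from the derivation: if B a = (1/lambda) a, then b = P^T a
# satisfies (P^T P) b = lambda b.
a = vecs[:, -1]
b = P.T @ a
lam_from_a = 1.0 / vals[-1]
```

In $\mathcal{R}^n$ the matrix B has n - 3 zero eigenvalues in addition to the three positive ones. The 3 × 3 positive-definite version used in Section 6.2.1 keeps only the latter.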

Given $X = PP^+x$ and $Y = PP^+y$ in $\mathcal{R}^3$, and a positive definite 3 × 3 matrix B, let $B'$
denote the ellipse defined by the intersection of the ellipsoid B with the plane span{X, Y}. We
need to find the eigenvalues of $B'$, $\frac{1}{\mu_1}$ and $\frac{1}{\mu_2}$.

Without loss of generality we assume that X and Y lie on the ellipsoid defined by B (namely,
we normalize the vectors so that $X^TBX = x^TBx = 1$ and $Y^TBY = y^TBy = 1$). Let $\theta$ denote
the angle between X and Y. We define two orthonormal vectors $x'$ and $y'$, which span the
plane span{X, Y}, as follows:

$$x' = \frac{X}{|X|}, \qquad y' = \frac{Y - \frac{X \cdot Y}{|X|^2} X}{|Y|\sin\theta}$$

Every vector $v \in \mathrm{span}\{X, Y\}$ can be written as

$$v = \alpha x' + \beta y'$$

and the intersection ellipse $B'$ is given by

$$v^T B v = 1 \iff \begin{pmatrix} \alpha & \beta \end{pmatrix} A^T B A \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = 1$$

for A the 3 × 2 matrix whose columns are $x'$ and $y'$. We therefore have that

$$B' = A^TBA = \begin{pmatrix} (x')^TBx' & (x')^TBy' \\ (x')^TBy' & (y')^TBy' \end{pmatrix}$$

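Substituting the expressions for $x'$ and $y'$, we get

$$\begin{aligned}
(x')^TBx' &= \frac{1}{|X|^2} \\
(y')^TBy' &= \frac{|X|^2 - 2|X||Y|\cos\theta\,(X^TBY) + |Y|^2\cos^2\theta}{|X|^2|Y|^2\sin^2\theta} \\
(x')^TBy' &= \frac{(X^TBY)|X| - |Y|\cos\theta}{|X|^2|Y|\sin\theta}
\end{aligned}$$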

To obtain the two eigenvalues of $B'$, $\frac{1}{\mu_1}$ and $\frac{1}{\mu_2}$, we solve the characteristic equation of $B'$,
whose roots are

$$\frac{|X|^2 + |Y|^2 - 2|X||Y|\cos\theta \cdot k \pm \sqrt{\left(|X|^2 + |Y|^2 - 2|X||Y|\cos\theta \cdot k\right)^2 - 4|X|^2|Y|^2\sin^2\theta\,(1 - k^2)}}{2|X|^2|Y|^2\sin^2\theta}$$

for $k = X^TBY = x^TBy$, $|X| = |PP^+x|$, $|Y| = |PP^+y|$, and $\cos\theta = \frac{X^TY}{|X||Y|}$.
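The closed-form roots can be compared against a direct eigenvalue computation of the 2 × 2 section. The sketch below works in $\mathcal{R}^n$ rather than in the rotated $\mathcal{R}^3$ system of the text, which is an assumption made for brevity. This does not change the section, since X and Y lie in the column space of P. The data are random.

```python
import numpy as np

rng = np.random.default_rng(6)

P = rng.normal(size=(8, 3))                 # hypothetical model matrix
P_pinv = np.linalg.pinv(P)
B = P_pinv.T @ P_pinv
proj = P @ P_pinv

# Two vectors in the column space of P, normalized so X^T B X = Y^T B Y = 1.
X = proj @ rng.normal(size=8)
Y = proj @ rng.normal(size=8)
X /= np.sqrt(X @ B @ X)
Y /= np.sqrt(Y @ B @ Y)

nx, ny = np.linalg.norm(X), np.linalg.norm(Y)
cos_t = (X @ Y) / (nx * ny)
sin_t = np.sqrt(1.0 - cos_t ** 2)
k = X @ B @ Y

# Closed-form roots of the characteristic equation of B'.
num = nx ** 2 + ny ** 2 - 2.0 * nx * ny * cos_t * k
disc = num ** 2 - 4.0 * nx ** 2 * ny ** 2 * sin_t ** 2 * (1.0 - k ** 2)
den = 2.0 * nx ** 2 * ny ** 2 * sin_t ** 2
roots = np.array([(num - np.sqrt(disc)) / den, (num + np.sqrt(disc)) / den])

# Direct computation: B' = A^T B A with A = [x', y'] orthonormal.
x_p = X / nx
y_p = (Y - (X @ Y) / nx ** 2 * X) / (ny * sin_t)
A = np.column_stack([x_p, y_p])
direct = np.linalg.eigvalsh(A.T @ B @ A)    # ascending: 1/mu_2, 1/mu_1
```

The reciprocals of these two eigenvalues are $\mu_1$ and $\mu_2$, the quantities that enter Eq. (20) and the tight bound of Eq. (27).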
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